Abstract. A program calculating Bhabha scattering at high energy colliders is
considered for porting to the EGEE Grid infrastructure. The program code, which is
a result of the AITALC project, is ported by using a master-worker operating scheme.
Job submission, execution and monitoring are implemented using the GridWay
metascheduler. The unattended execution of jobs turned out to be complete and
rather efficient, even without prior knowledge of the grid. While the batch of
jobs remains organized at the user's side, the actual computation was carried out
within the phenogrid virtual organization. The scientific results support the use
of small angle Bhabha scattering for the luminosity measurements of the
International Linear Collider project.



1 Introduction 

The International Linear Collider (ILC) is an electron-positron accelerator planned
to supersede the Large Hadron Collider (LHC) and lead high energy physics research
in the coming decades. Still, the proposal and supporting groups for the ILC are
awaiting the first signals from the LHC in order to align their goals with the
discoveries expected from the proton-proton collider. Thousands of scientists and
engineers firmly believe [1,2] in the advantage of the cleaner environment resulting
from electron-positron collisions at the ILC. This translates into a clearer signal
to background ratio than at the LHC, due to the absence of the plethora of hadronic
subproducts. Therefore, this kind of accelerator could determine important
parameters of the Standard Model of particle physics much more precisely and thus
lead to tighter model constraints and even discoveries "by precision". Some
high-precision measurements in history, like planetary motion, the speed of light
or, more recently, deep inelastic scattering, led to remarkable advances in our
understanding of astronomy, special relativity and quantum chromodynamics (QCD),
respectively. They remind us of the importance of such a methodology.

The calculations involved in an accurate determination of any physical observable
are, within perturbation theory in quantum field theory, rather cumbersome.
Fortunately, computing resources have grown fast enough during the last decades to
accomplish these calculations despite their increasing complexity. Automated
software tools have arisen in the last thirty years to provide systematic and
reliable answers for these predictions.

In this context, the EGEE project provides a computing infrastructure where
scientists and engineers perform numerous studies and tests in order to provide an
efficient distributed architecture for Grid computing.

In this paper we describe the scientific goals (Sec. 2) and the methodology we used
for adapting a code for theoretical predictions to run properly on the Grid
(Sec. 3). Analyses of the execution of jobs (Sec. 4) and the scientific results
(Sec. 5) form the basis of the case study. Finally, a look at further possibilities
and conclusions closes this contribution in Sec. 6.

2 Scientific scope 

Nowadays, high energy physics applications use Grid resources intensively during
data processing and analyses from the LHC. Theoreticians also require heavy
computational tasks to match the level of accuracy achieved by the experimental
measurements. In the near future, after the LHC experience, it may well be
established that Grid technologies are a standard procedure for achieving the
per-mille level of uncertainty expected for the ILC.

In this article we will consider Bhabha scattering (i.e., the scattering of an
electron-positron pair into an electron-positron pair)

    $e^- e^+ \to e^- e^+$ ,    (1)

as our target process.

One of the reasons to consider Bhabha scattering is its high cross section at small
angles, where the small deflections come mainly from electromagnetic interactions.
The resulting large statistics within a well understood model allows for luminosity
calibration. This technique has been used at high energy colliders like LEP and
SLD. Additionally, large angle Bhabha scattering has also been used at low energy
b-quark factories like Belle or BaBar.

On the computational side, AITALC [3] is a useful tool for automating some
calculations needed by theoreticians and phenomenologists in high energy physics.
It produces numerical programs directly from the symbolic representation of the
process, by using the so-called Feynman rules [3] and generating the complete
analytic expressions for the respective Feynman diagrams within perturbation
theory. Since the package is able to produce code for the scattering of 2 → 2
fermions, it was suited to deliver the necessary routines to compute integrated
cross sections for Bhabha scattering, including the first order corrections coming
from electroweak quantum loop effects and soft-photon radiation.

The goal was to scan the full range of centre-of-mass energies, from the 10 GeV of
the B-factories to the 1000 GeV expected to be achieved in the second running phase
of the ILC. This range is covered by 2048 data-points for both the tree-level
(zeroth order) and the first order corrected integrated cross sections. This
coverage makes it possible to quantify the impact of the quantum loop effects, and
relies on a complementary simulation of hard-photon effects [5] by Monte Carlo
programs [6]. Moreover, two-loop photonic corrections [7,8,9] also play a role in
completing the full theoretical prediction.

3 Porting the application 
3.1 Preliminaries 

Whilst AITALC is a tool integrating three other independent packages, FORM [10],
DIANA [11] and LOOPTOOLS, the resulting code which delivers the numerics to the
end-user is built as a dynamically linked FORTRAN executable.

Our primary intention was to port the complete process to the Grid, both the
generating tool and the code execution.






[Figure 1: work-flow diagram; surviving block labels: Make, Fortran submake]

Fig. 1. Job work-flow for a typical process study with AITALC. Squares enclose
those tasks with an internal structure suitable for some kind of parallelization.
The double-squared box shows the part most suitable to be ported to the Grid.



Fig. 1 describes the work-flow of the tool AITALC. There is a main make process
which organizes the tree-level (0th order submake), the loop corrections (1st order
submake) and the final numerical part (FORTRAN submake) through automated
Makefiles. These three processes depend on each other as depicted, so the 0th and
1st order could be run in parallel. Because of the complexity of the internal
calculations, the 1st order demands far more resources than the 0th order. Still,
there are some internal parts of the branches, the dynamical ones under FORM
(form Dyn.), which could in principle also be parallelized. The last block, which
evaluates the numerics (FORTRAN run), strongly depends on the user's needs.

There are two main reasons why we considered only the final task for porting to
the Grid:

- Independence. AITALC runs natively on Unix machines. While it is not uncommon to
  find dedicated machines hosting FORTRAN compilers, FORM, DIANA and LOOPTOOLS are
  very specialized software packages whose licences and distribution channels
  prevent out-of-the-box availability in standard Linux distributions. For this
  reason, AITALC supplies an installation script that sorts out these
  inconveniences, but it is unfeasible to provide a single executable, and the
  whole tool together with these three packages would have to be installed on each
  running node. The latter scenario immediately conflicts with standardized user
  permissions and node specifications.

- Timings. The production of the FORTRAN code does not take much time on a single
  machine for an example process. Even our fully massive electroweak Bhabha
  scattering, the largest process AITALC is able to generate, took just a few
  minutes on our testing machine, as shown in Tab. 1. Because the total time to
  evaluate 2048 data-points is approximately two orders of magnitude larger (see
  Tabs. 2 and 3), we may neglect the time to create the FORTRAN code.



Task                 Time [s]
0th order submake          12
1st order submake         381
FORTRAN submake            57

Table 1. Detailed running time for each building block of AITALC when producing
the FORTRAN code for Bhabha scattering.
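The comparison between code generation and distributed evaluation can be checked
with a few lines of arithmetic. The sketch below is only an illustration: the
building times are copied from Tab. 1, the 18.55 distributed hours are read off the
title of Fig. 3, and the variable names are hypothetical, not part of AITALC.

```python
# Building times of the AITALC blocks, in seconds, copied from Tab. 1.
generation_times_s = {
    "0th order submake": 12,
    "1st order submake": 381,
    "FORTRAN submake": 57,
}

# Distributed computing time quoted in the title of Fig. 3 (18.55 hours).
grid_time_s = 18.55 * 3600

# Total time needed once to produce the FORTRAN code.
generation_total_s = sum(generation_times_s.values())

# How many times longer the distributed evaluation is than the code generation.
ratio = grid_time_s / generation_total_s

print(f"code generation: {generation_total_s} s")
print(f"grid evaluation: {grid_time_s:.0f} s")
print(f"ratio:           {ratio:.0f}")
```

The code generation adds up to 450 s, roughly 150 times less than the distributed
evaluation, which is the sense in which it can be neglected.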



3.2 Master-worker scheme 

The running profile of the application makes the master-worker scheme very suitable
for achieving a balance between user intervention and execution performance. A
scheme of the final application is depicted in Fig. 2. It shows a work-flow
consisting of a master part, which decides how heavy the jobs are going to be
(according to the user's instructions), and a worker part, which is transferred and
run remotely. The master communicates with the Grid via the gwsubmit command of the
GridWay metascheduler [13]. GridWay offers a clean and user-friendly interface to
submit jobs to a Grid middleware like GLOBUS or gLite.




Fig. 2. Grid job work-flow. The GridWay metascheduler acts as middleware
establishing communication between master and worker.



Since GridWay is capable of unattended job migration, recovery and rescheduling
[14], it was extremely useful for getting, with the simplest generic configuration,
a complete set of jobs delivered for execution at different computing elements.
Thus a deeper study of the queue and hardware characteristics of each cluster is
not required.

Converting the FORTRAN program into a worker was a minor issue, which is
nevertheless worth mentioning since it is a common problem met by the scientific
community. We chose to subdivide the worker into two parts:

- A simple main program, written in C++, accepting running parameters as arguments
  passed through the command line interface. This main program calls, for each
  parameter configuration, a single FORTRAN subroutine.

- The FORTRAN subroutine, which is an adaptation of the unported main program with
  the settings for initial parameters commented out. These are, according to the
  AITALC description: nsqrtsman, minsqrtsman, maxsqrtsman, setlimcost,
  setfracomega.

Moreover, it was also required to ensure 32-bit compatibility and static linkage
(usually the flags -m32 -static in compilers). In this way, the final executable
could run on any working node without missing any external libraries. The size of
this executable (5.8 MB) did not create transfer time bottlenecks.

4 Execution analysis 

The scan of data-points was planned in batches of jobs. Each batch contained
specific instructions to automatically generate job templates with a specific
configuration for every job. This configuration concerned only the definition of
the parameter range. The batches had an equal amount of data to process, so we can
consider the blocks equivalent to one another:

- 1st block: 4 jobs containing 512 data-points each,
- 2nd block: 8 jobs containing 256 data-points each,
- 3rd block: 16 jobs containing 128 data-points each,
- 4th block: 32 jobs containing 64 data-points each.

The evolution of a successful job is as follows. First, after a computing element
has been chosen and has accepted the job, the time spent waiting for a working node
is denoted as pending. Some stage-in time is then required to transfer data to the
computing element, after which the job enters its execution stage. Finally, a
stage-out time is needed to transfer back the output and status. We must also add
some overhead due to job request processing and state change notifications.
Occasionally the job is migrated to another computing element after a suspension
time configured by default in GridWay; in that case the time appears in the
graphics as waiting. Regarding totals, computing time denotes the time summed over
the different machines, and the keyword human denotes the wall-clock waiting time
from batch start to batch end.



[Figure 3: "aITALC jobs time analysis at phenogrid"; 60 jobs, total of 18.55
distributed hours]

Fig. 3. Overall job-stage time for each computing element. The vertical order of
bars is the same as shown in the legend.



All the jobs were submitted to the phenogrid infrastructure, which takes part in
the EGEE project under the phenomenology-focussed virtual organization pheno. The
timing analysis by cluster is given in Tab. 2 and shown graphically in Fig. 3.
Three of the six computing elements worked properly and output data files were
produced at a regular rate. Another one (lcgce0.shef.ac.uk) seemed to have a
misconfiguration [15] and therefore left our jobs pending for too long, so they
were migrated automatically by GridWay to more capable hosts. With the other
Computing Element            CPU              Nodes   Time performance
FQDN                         Type      MHz    Total   Pend.  St.-in   Exec.  St.-out  Overh.
ce02.dur.scotgrid.ac.uk      Xeon     2667      672   14074     127   14670        -      78
lcgce0.shef.ac.uk            Opteron  2400      190   12673       -       -        -      27
lcgce02.gridpp.rl.ac.uk      P.III    1001     2525    3948     104   18447       41     243
svr026.gla.scotgrid.ac.uk    Opteron  2200     1896     306      66    1949        -      11
ce1.pp.rhul.ac.uk*           P.IV     1000      136       -       -       -        -     149
svr021.gla.scotgrid.ac.uk*   Opteron  1896     2200       -       -       -        -      18

* These computing elements repeatedly returned job-callback errors.

Table 2. Computing elements used for scheduling, belonging to the pheno Virtual
Organization at the EGEE infrastructure. Sums of short times rounded to the second
may be shown as zero, even if there was some accumulated latency.



Block                Time performance [s]
Jobs x data-points   Pending   Stage-in   Execution   Stage-out   Overhead   Total
4 x 512                 3980         11        8648           5         30   12674
8 x 256                 9484         42        8939           3         39   18507
16 x 128                5989         86        8728          10         82   14895
32 x 64                11548        158        8751          23        208   20688

Jobs x data-points   Pend./job   St.-in/job   Exec./job   St.-out/job   Overh./job   Human
4 x 512                 995.00         2.75     2162.00          1.25         7.50    3348
8 x 256                1185.50         5.25     1117.38          0.38         4.88    3131
16 x 128                374.31         5.38      545.50          0.63         5.13    1616
32 x 64                 360.88         4.94      273.47          0.72         6.50    1328

Table 3. Time performance by block distribution of jobs. Precision is limited by
the rounding of individual timings to the second.
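The entries of Tab. 3 are related by simple sums and averages, which the following
sketch reproduces. It is an illustration only: the numbers are copied from the
table, while the dictionary layout, the function names and the choice of the 4-job
block as reference for the speed-up are our own assumptions.

```python
# (jobs, data-points per job) -> times in seconds, copied from Tab. 3.
blocks = {
    (4, 512): {"pending": 3980, "stage_in": 11, "execution": 8648,
               "stage_out": 5, "overhead": 30, "human": 3348},
    (8, 256): {"pending": 9484, "stage_in": 42, "execution": 8939,
               "stage_out": 3, "overhead": 39, "human": 3131},
    (16, 128): {"pending": 5989, "stage_in": 86, "execution": 8728,
                "stage_out": 10, "overhead": 82, "human": 1616},
    (32, 64): {"pending": 11548, "stage_in": 158, "execution": 8751,
               "stage_out": 23, "overhead": 208, "human": 1328},
}

STAGES = ("pending", "stage_in", "execution", "stage_out", "overhead")


def computing_time(block):
    """Total computing time of a block: the sum over all job stages."""
    return sum(block[stage] for stage in STAGES)


def per_job(block, n_jobs, stage):
    """Average time per job spent in one stage."""
    return block[stage] / n_jobs


def human_speedup(all_blocks, reference=(4, 512)):
    """Human (wall-clock) time of the reference block divided by each block's."""
    ref = all_blocks[reference]["human"]
    return {key: ref / block["human"] for key, block in all_blocks.items()}


speedups = human_speedup(blocks)
for (n_jobs, n_points), block in blocks.items():
    print(f"{n_jobs:2d} x {n_points:3d}: total {computing_time(block):5d} s, "
          f"exec/job {per_job(block, n_jobs, 'execution'):7.2f} s, "
          f"human speed-up {speedups[(n_jobs, n_points)]:.2f}")
```

With these numbers the 32-job block finishes about 2.5 times sooner than the 4-job
block although it contains eight times as many jobs, while the execution time per
job scales almost exactly with the number of data-points.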



two computing elements (ce1.pp.rhul.ac.uk and svr021.gla.scotgrid.ac.uk) we found
some errors after submitting our jobs that could be related to the local resource
management system (LRMS), so GridWay applied a temporary banning policy to them to
avoid unsuccessful retries. This is just one of the mechanisms already implemented
to enhance performance without user intervention.

Looking at Tab. 3 and Figs. 4-7, we can appreciate the fair behaviour of Grid
computing. The more jobs the block contains, the less human time the end-user
waits, so parallelization works as expected. Nonetheless, for a small number of
jobs such as we have in this study, we cannot expect a linear reduction, due to the
timeout of the pending time which some computing elements introduce. This timeout
is configured in GridWay via the SUSPENSION_TIMEOUT parameter, and the end-user may
include it in the job template. This setting, as well as many other
performance-enhancing strategies, lies outside the scope of this paper.
Fig. 4. Batch of 4 jobs computing 512 data-points each at phenogrid. The evolution
of each job is split into separate bars every time it was migrated between
computing elements.
[Figure 5 legend: overhead, stage-out, execution, stage-in, pending, waiting]

Fig. 5. Batch of 8 jobs computing 256 data-points each at phenogrid. The evolution
of each job is split into separate bars every time it was migrated between
computing elements.



5 Scientific results 

The correct execution of all the jobs allowed us to compose the complete scan of
the integrated cross section for Bhabha scattering. Two scans were performed with
crossed exponential stepping to ensure that a late or failed job could still be
interpolated from the rest without much loss of precision. The following two
configurations of scattering angle $\theta$ and maximum soft photon energy
$E_\gamma^{\max}$ were considered:

- Large angle: $-0.9 < \cos\theta < 0.9$, $E_\gamma^{\max} = 0.1\sqrt{s}$
- Small angle: $25\ \mathrm{mrad} < \theta < 90\ \mathrm{mrad}$, $E_\gamma^{\max} = 0.2\sqrt{s}$

where $\sqrt{s}$ is the centre-of-mass energy.

We can observe the resonance induced by the neutral gauge Z boson in Fig. 8. Here
different symbols indicate different job identifiers, so the complete result is
available only when all the jobs have finished. The electroweak corrections
Fig. 6. Batch of 16 jobs computing 128 data-points each at phenogrid. The evolution
of each job is split into separate bars every time it was migrated between
computing elements.

[Figure 7: "32 jobs, 64 datapoints each"]

Fig. 7. Batch of 32 jobs computing 64 data-points each at phenogrid. The evolution
of each job is split into separate bars every time it was migrated between
computing elements.



are quite important in this range, at the percent level, since the angular
integration stops short of the angles close to collinearity.

Fig. 9 depicts the second configuration, for small angle scattering in the forward
region, and the relative importance of the first order ($O(\alpha)$) perturbative
corrections. Here the resonance disappears due to the large contributions coming
from the divergent Feynman diagram exchanging a photon in the collinear case.
Fig. 8. Bhabha scattering around the Z-pole. Tiny dots stand for the tree-level,
while shaped symbols stand for the electroweak 1-loop plus soft photon corrections
with maximum soft photon energy $E_\gamma^{\max} = 0.1\sqrt{s}$. Each of the
sixteen different symbols matches a different job identifier, the scan of
data-points having been computed in a distributed environment.



In this case the relative corrections are small, of per-mille order: $O(10^{-3})$.
As mentioned before, this study should be supplemented with hard-photon
bremsstrahlung and second order perturbative corrections in order to specify the
amount of uncertainty coming from the theoretical prediction.
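For reference, the relative correction discussed here can be written explicitly.
The notation below is a sketch of our own and is not taken from the original
formulas: $\sigma_0$ stands for the tree-level integrated cross section,
$\sigma_1$ for the one-loop plus soft-photon corrected one, and
$\theta_{\min}$, $\theta_{\max}$ for the angular limits of the chosen
configuration.

```latex
\sigma_i(s) = \int_{\cos\theta_{\max}}^{\cos\theta_{\min}}
              \frac{d\sigma_i}{d\cos\theta}\, d\cos\theta ,
\qquad i = 0, 1 ,
\qquad
\delta(s) = \frac{\sigma_1(s) - \sigma_0(s)}{\sigma_0(s)} .
```

In this notation a per-mille correction means $|\delta| \sim 10^{-3}$. If the
percentage plotted in Fig. 9 is read as $\delta$, its quoted maximum of 0.33%
corresponds to $\delta \approx 3.3 \times 10^{-3}$.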

6 Conclusions and outlook 

Porting the application was successful with the help of the GridWay metascheduler.
A master/worker scheme was developed where the master remains at the user's side,
giving instructions through GridWay about how the submission should be partitioned
and managed. Because the code created by AITALC is a dynamically linked executable,
a few modifications to the original code were required in order to run safely as a
worker under the different configurations of the Grid nodes. Different workload
balances were studied for the sake of performance, finding a reasonable default
behaviour without any configuration tweaking. Still, the time spent waiting for a
working node limits the improvement obtained by making the jobs smaller. Therefore,
submitting jobs which require less than a few minutes to complete does not increase
performance, because Grid latencies start playing a significant role.

Future work includes fine-graining of data results through feedback into the 
master and standardization of porting of similar codes developed without parallel 
Fig. 9. Integrated cross section between $25\ \mathrm{mrad} < \theta < 90\ \mathrm{mrad}$
for Bhabha scattering (above). The percentage of the one-loop corrected cross
section with respect to the tree-level is also shown (below), with a maximum of
0.33% at $E = m_Z$. The maximum soft photon energy was taken to be $0.2\sqrt{s}$.



execution in mind. An interesting possibility would be the implementation of more 
advanced job managers which dynamically obtain feedback from finished jobs and 
try strategies according to well defined policies. This could minimize waiting time, 
avoiding strictly failing nodes or avoiding resubmission of jobs being processed 
slowly. 